ZaliQL: A SQL-Based Framework for Drawing Causal Inference from Big Data

Authors

  • Babak Salimi
  • Dan Suciu
Abstract

Causal inference from observational data is a subject of active research and development in statistics and computer science. Many toolkits that depend on statistical software have been developed for this purpose. However, these toolkits do not scale to large datasets. In this paper we describe a suite of techniques for expressing causal inference tasks from observational data in SQL. This suite supports the state-of-the-art methods for causal inference and runs at scale within a database engine. In addition, we introduce several optimization techniques that significantly speed up causal inference, in both the online and offline settings. We evaluate the quality and performance of our techniques through experiments on real datasets.
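To make the idea of expressing a causal inference task in SQL concrete, here is a minimal sketch. It is not taken from the paper: the table `units`, its columns, the toy rows, and the helper `estimate_ate` are all hypothetical, and the estimator shown (exact matching on one discrete covariate, followed by a stratum-size-weighted difference in mean outcomes) is only one simple method of this kind. Python's standard `sqlite3` module stands in for the database engine.

```python
import sqlite3

# Hypothetical toy data: (unit id, covariate stratum, treatment flag, outcome).
ROWS = [
    (1, "a", 1, 10.0),
    (2, "a", 0, 6.0),
    (3, "a", 0, 8.0),
    (4, "b", 1, 5.0),
    (5, "b", 1, 7.0),
    (6, "b", 0, 4.0),
    (7, "c", 1, 9.0),  # stratum "c" has no control unit, so matching drops it
]

# Exact matching expressed as a GROUP BY: keep only strata that contain both
# treated and control units, then average the per-stratum difference in mean
# outcomes, weighted by stratum size.
QUERY = """
WITH strata AS (
    SELECT stratum,
           AVG(CASE WHEN treated = 1 THEN outcome END) AS avg_treated,
           AVG(CASE WHEN treated = 0 THEN outcome END) AS avg_control,
           COUNT(*) AS n
    FROM units
    GROUP BY stratum
    HAVING SUM(treated) > 0 AND SUM(1 - treated) > 0
)
SELECT SUM((avg_treated - avg_control) * n) / SUM(n) FROM strata
"""


def estimate_ate(rows):
    """Load rows into an in-memory table and run the matching query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE units "
        "(id INTEGER, stratum TEXT, treated INTEGER, outcome REAL)"
    )
    conn.executemany("INSERT INTO units VALUES (?, ?, ?, ?)", rows)
    (ate,) = conn.execute(QUERY).fetchone()
    conn.close()
    return ate


ate = estimate_ate(ROWS)
print(ate)
```

Because the whole estimator is a single aggregate query, the grouping and filtering run inside the engine rather than in client code, which is the property the abstract relies on for scale.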

Related resources

ZaliQL: Causal Inference from Observational Data at Scale

Causal inference from observational data is a subject of active research and development in statistics and computer science. Many statistical software packages have been developed for this purpose. However, these toolkits do not scale to large datasets. We propose and demonstrate ZaliQL: a SQL-based framework for drawing causal inference from observational data. ZaliQL supports the state-of-the...

Causal inference from big data: Theoretical foundations and the data-fusion problem

We review concepts, principles, and tools that unify current approaches to causal analysis, and attend to new challenges presented by big data. In particular, we address the problem of data-fusion – piecing together multiple datasets collected under heterogeneous conditions (i.e., different populations, regimes, and sampling methods) so as to obtain valid answers to queries of interest. The ava...

2016 Olympic Games on Twitter: Sentiment Analysis of Sports Fans Tweets using Big Data Framework

Big data analytics is one of the most important subjects in computer science. Today, due to the increasing expansion of Web technology, a large amount of data is available to researchers. Extracting information from these data is one of the requirements for many organizations and business centers. In recent years, the massive amount of Twitter's social networking data has become a platform for ...

A Framework for Inferring Causality from Multi-Relational Observational Data using Conditional Independence

The study of causality or causal inference – how much a given treatment causally affects a given outcome in a population – goes way beyond correlation or association analysis of variables, and is critical in making sound data driven decisions and policies in a multitude of applications. The gold standard in causal inference is performing controlled experiments, which often is not possible due t...

Intensional RDB Manifesto: a Unifying NewSQL Model for Flexible Big Data

In this paper we present a new family of Intensional RDBs (IRDBs) which extends the traditional RDBs with the Big Data and flexible and ’Open schema’ features, able to preserve the user-defined relational database schemas and all preexisting user’s applications containing the SQL statements for a deployment of such a relational data. The standard RDB data is parsed into an internal vector key/v...

Journal title:
  • CoRR

Volume abs/1609.03540  Issue 

Pages  -

Publication date 2016